Global predictions of the secondary structure of coronavirus (CoV) 5’ untranslated regions and adjacent 
coding sequences revealed the presence of conserved structural elements. Stem loops (SL) 1, 2, 4, and 5 were 
predicted in all CoVs, while the core leader transcription-regulating sequence (L-TRS) forms SL3 in only some 
CoVs. SL5 in group I and II CoVs, with the exception of group IIa CoVs, is characterized by the presence of a 
large sequence insertion capable of forming hairpins with the conserved 5’-UUYCGU-3’ loop sequence. 
Structure probing confirmed the existence of these hairpins in the group I Human coronavirus-229E and the 
group II Severe acute respiratory syndrome coronavirus (SARS-CoV). In general, the pattern of the 5’ cis-acting 
elements is highly related to the lineage of CoVs, including features of the conserved hairpins in SL5. The 
function of these conserved hairpins as a putative packaging signal is discussed. 


© 2010 Elsevier Inc. All rights reserved. 


Introduction 


The emergence of the Severe acute respiratory syndrome coronavirus 
(SARS-CoV) in 2003 has boosted related research and led to the 
discovery of many novel coronaviruses (CoVs) from different hosts 
such as equines, whales, birds, and bats; bats are 
considered the potential reservoir of SARS-CoV (Guan et al., 2003; 
Ksiazek et al., 2003; Li et al., 2005; Marra et al., 2003; Mihindukulasuriya 
et al., 2008; Woo et al., 2007, 2009; Zhang et al., 2007). In the past 
few years, two novel human CoVs, NL63 and HKU1, have also been 
identified, causing rather severe symptoms in infants and the elderly 
(van der Hoek et al., 2004; Woo et al., 2005). The discovery of so many 
novel CoVs calls for a better understanding of the phylogeny of CoVs. 

Based on serological patterns and genome organization, the genus 
Coronavirus has been classified into three major groups: groups I, II, and III 
(Lai and Cavanagh, 1997; Brian and Baric, 2005). More recently, these 
groups have been further subdivided into, in total, 9 subgroups, based 
upon amino acid similarity of structural and non-structural proteins 
(nsp) (Snijder et al., 2003; Woo et al., 2006, 2007; Woo et al., 2006, 
2007). However, other studies propose at least 5 distinct lineages (Tang 
et al., 2006; Dong et al., 2007; Vijaykrishna et al., 2007), and even for 
SARS-CoV there is discussion whether it represents a separate lineage 
(Rota et al., 2003) or is an early split-off of group II CoVs (Snijder et al., 
2003; Gibbs et al., 2004). Thus, in addition to the conventional pair-wise 
comparison of viral protein sequences, other genetic or structural 
features may be helpful in the classification of CoVs. 

In the genome of CoVs, like that of most RNA viruses, the 5’ and 
3’ untranslated regions (UTRs) usually harbor important structural 
elements which are involved in replication and/or translation (Chang 
et al., 1994; Raman et al., 2003; Raman and Brian, 2005; Goebel et al., 
2007; Züst et al., 2008; Liu et al., 2009). In Mouse hepatitis virus 
(MHV), a group II CoV, a bulged stem-loop and a pseudoknot 
structure were identified in the 3’ UTR (Goebel et al., 2004a). Similar 
pseudoknot structures were found in other group I and II CoVs, 
showing structural conservation of the CoV 3’ UTR (Goebel et al., 
2004a). However, the 3’ UTR of MHV could be functionally replaced by 
the 3’ UTR of group II SARS-CoV but not by that of the group I 
Transmissible gastroenteritis virus (TGEV) or the group III Avian 
infectious bronchitis virus (IBV), indicating certain group-specific 
functions for the 3’ UTR (Goebel et al., 2004b). 

In this study the secondary structures of the 5’ UTRs and the 
5’-proximal sequences of the ORF1ab gene in all known CoVs were 
predicted. The structural features of this region turned out to reflect 
the known grouping of CoVs, which is based on amino acid similarity. 
The unique and conserved features were further investigated in detail. 


Results and discussion 


The clustering of the 5’-proximal sequence of CoV RNAs shows 
group specificity 


The clustering of the CoV 5’-proximal 420 nucleotides (nts) 
obtained with ClustalW2 at the EBI webserver (see Materials and methods) 

[Fig. 1: clustering tree not reproduced; the tabulated features follow.]

Coronavirus          Group   TRS-L   uORF              AUG of ORF1ab   Accession number
TGEV/Purdue          Ia      94      117-128           315             NC_002306
TGEV/TS              Ia      94      117-128           315             DQ201447
FIPV/79-1146         Ia      94      117-128           312             NC_007025
FCoV/Black           Ia      93      116-127           312             EU186072
FCoV/C1Je            Ia      92      115-126           311             DQ848678
HCoV-229E/inf-1      Ib      66      86-121,102-116    293             NC_002645
HCoV-NL63/AMS-I      Ib      66      101-118           287             NC_005831
HCoV-NL63/AMS-496    Ib      66      101-118           287             DQ445912
BtCoV-HKU2           Ib      69      98-118,119-130    297             NC_009988
BtCoV-1A             Ic      62      87-104,146-181    272             NC_010437
BtCoV-1B             Ic      63      88-105,147-182    273             NC_010436
BtCoV-HKU8           Ic      63      88-105,147-182    270             NC_010438
PEDV/CV777           Id      67      99-137            297             NC_003436
PEDV/LZC             Id      67      99-179            297             EF185992
BtCoV-512/2005       Id      70      97-135            294             NC_009657
IBV/Beaudette        IIIa    57      131-166           529             NC_001451
IBV/Beaudette-p65    IIIa    57      131-166           529             DQ001339
IBV/Peafowl          IIIa    57      131-166           529             AY641576
TCoV/MG10            IIIa    57      131-163           529             NC_010800
CoV-SW1              IIIb    72      -                 524             NC_010646
BCoV                 IIa     65      100-126           211             NC_003045
HEV                  IIa     65      100-126           211             NC_007732
HCoV-OC43            IIa     65      100-126           211             NC_005147
HCoV-HKU1            IIa     66      98-118            206             NC_006577
MHV-A59              IIa     66      99-125            210             NC_001846
SARS-CoV/Tor2        IIb     67      104-136           265             NC_004718
SARS-CoV/TJF         IIb     66      103-135           264             AY654624
BtSCoV-Rp3           IIb     67      104-136           265             NC_009693
BtSCoV-HKU3-1        IIb     65      102-134           262             NC_009694
BtSCoV-279/2005      IIb     65      104-136           262             DQ648857
BtSCoV-Rm1           IIb     66      104-133           261             NC_009696
BtSCoV-273/2005      IIb     67      104-133           261             DQ648856
BtCoV-HKU9-4         IId     71      -                 229             EF065516
BtCoV-HKU9-2         IId     69      -                 228             EF065514
BtCoV-HKU9-3         IId     70      -                 229             EF065515
BtCoV-HKU9-1         IId     71      -                 229             NC_009021
BtCoV-HKU4-1         IIc     63      140-163           267             NC_009019
BtCoV-HKU4-4         IIc     63      140-163           267             EF065508
BtCoV-133/2005       IIc     56      133-156           260             NC_008315
BtCoV-HKU5-1         IIc     61      141-164           261             NC_009020
BtCoV-HKU5-2         IIc     61      140-163           260             EF065510
BuCoV-HKU11/796      IIIc    72      -                 607             NC_011548
ThCoV-HKU12/600      IIIc    65      -                 592             NC_011549
MuCoV-HKU13/3514     IIId    64      -                 595             NC_011550


Fig. 1. Clustering and general features of the 5’ 420 nucleotides of CoVs. The tree is based on a multiple sequence alignment using ClustalW2 at the European Bioinformatics Institute 
webserver. The phylogenetic group, the start of core TRS-L, the region of upstream ORF (uORF), the start of ORF1ab, and GenBank accession number of each CoV are listed. 


basically resembled the current grouping system for CoVs (Fig. 1), 
though group I CoVs may be further subdivided into 4 subgroups, 
groups Ia to Id, according to their relatively large phylogenetic distances 
(Fig. 1). Sequence comparison further showed conserved and 
unique features for each CoV group, including: (i) the relative location 
of the core sequence of the leader transcription-regulating sequence 
(L-TRS) is quite conserved in all CoVs, except for the one in group 
Ia CoVs which has a rather long leader sequence upstream of the 
core TRS; (ii) the potentially translatable short ORF upstream of the 
genomic ORF1ab, the uORF, is present in most CoVs except for group 
IId, IIIb, IIIc, and IIId CoVs; (iii) the 5’ UTR in group III CoVs is substantially 
longer than that in group I and II CoVs, while group IIa CoVs 
have an exceptionally short 5’ UTR (Fig. 1). Note that, in 
order to obtain a higher threshold of the phylogenetic distance, strains 
with the highest sequence variation were used for analysis (selected 
from the genomic sequences of all CoVs available in GenBank), so that 
any homology found within a cluster is all the more meaningful. To 
further examine if particular features found in the RNA sequence in 
each group are relevant to specific organization of the 5’ cis-acting 
elements, we globally predicted the secondary structures of the CoV 5’ 
UTRs, predominantly using computational calculations at the mfold 
webserver (Zuker, 2003). We have identified several conserved stem- 
loop (SL) structures in this region, some of which are organized in a 
group-specific manner (see Figs. 2, 3, and 4). 
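
As an illustration of how uORF coordinates such as those listed in Fig. 1 could be derived from a sequence, the following minimal Python sketch scans for AUG codons upstream of the ORF1ab start and reports the first in-frame stop codon for each. The function name `find_uorfs`, the coordinate convention (1-based, end position includes the stop codon), and the toy sequence are assumptions made for illustration; this is not the procedure used in the study.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_uorfs(rna, main_start):
    """List AUG-initiated ORFs that begin upstream of the main start codon.

    rna        -- RNA (or DNA) sequence, 5' to 3'
    main_start -- 1-based position of the A of the main (ORF1ab) AUG
    Returns (start, end, in_frame) tuples with 1-based coordinates; `end` is
    the last nucleotide of the stop codon, and `in_frame` tells whether the
    uORF shares the reading frame of the main ORF.
    """
    rna = rna.upper().replace("T", "U")
    uorfs = []
    for i in range(main_start - 1):          # 0-based candidates upstream of the main AUG
        if rna[i:i + 3] != "AUG":
            continue
        # Walk codon by codon until the first stop codon in this frame.
        for j in range(i + 3, len(rna) - 2, 3):
            if rna[j:j + 3] in STOP_CODONS:
                in_frame = (main_start - 1 - i) % 3 == 0
                uorfs.append((i + 1, j + 3, in_frame))
                break
    return uorfs

# Toy sequence (hypothetical, not a CoV sequence): uORF at 3-11, main AUG at 14.
print(find_uorfs("GGAUGAAAUAAGGAUGCCC", 14))
```

Each upstream AUG is reported separately, so nested start codons in the same frame yield more than one entry.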


Fig. 2. The structural-phylogenetic analysis of the 5’-proximal sequences in group I CoVs. The predicted secondary structures of the 5’-proximal sequence of (A) group Ia TGEV-purdue, 
(B) group Ib HCoV-229E-inf-1, (C) group Ic PEDV-CV777, and (D) group Id BtCoV-1A coronaviruses are shown. Nucleotide variations located in the conserved elements in 
the other representative CoVs of each subgroup are indicated. The start codon of the ORF1ab is boxed, the core sequence of the transcription-regulating leader (TRS-L CS) is 
bracketed, and the length of the sequence insertion in SL5 is indicated. 


The universal presence of SL1 and SL2 in CoV 5’ UTR 


The very 5’ nts of CoV RNAs fold into a hairpin of low thermodynamic 
stability, SL1, which is supported by many co-variations 
(Figs. 2-4), particularly in group IIa and IIIc CoVs. The loop sequences 
are not strongly conserved although a YRYR tetra-loop seems to be 
preferred in most SL1s. A general feature of SL1 is the presence of 
mismatches, bulges (e.g. in group I and II CoV RNAs) and a high number 
of A-U and U-A base pairs (bps) (e.g. in group IIIa, b, and d CoV RNAs). 
Recent data by Li et al. (2008) suggest that the low thermodynamic 
stability of SL1 is important for the replication of MHV. 
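
The notion of a co-variation supporting a predicted stem can be made concrete with a small sketch. Given a reference sequence, an aligned variant, and the list of predicted base pairs, each pair is labeled as conserved, co-varying (both nucleotides changed and pairing retained), singly changed with pairing retained (for example G-C to G-U), or disrupted. The function `classify_pairs`, the label names, and the 12-nt toy hairpin are hypothetical and serve only as an illustration; G-U wobble pairs are assumed to count as pairing.

```python
# Watson-Crick and G-U wobble pairs are treated as "paired" (an assumption).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def classify_pairs(ref, var, pairs):
    """Classify each predicted base pair of `ref` in the aligned variant `var`.

    ref, var -- gap-free aligned RNA sequences of equal length
    pairs    -- list of (i, j) 1-based positions predicted to pair in `ref`
    """
    result = {}
    for i, j in pairs:
        r = (ref[i - 1], ref[j - 1])
        v = (var[i - 1], var[j - 1])
        if r == v:
            label = "conserved"
        elif v not in CANONICAL:
            label = "disrupted"
        elif r[0] != v[0] and r[1] != v[1]:
            label = "co-variation"
        else:
            label = "single change, pair kept"
        result[(i, j)] = label
    return result

# Toy hairpin (hypothetical): 4-bp stem closed by a UUCG loop.
ref = "GGACUUCGGUCC"
var = "GAGCUUCGAUUC"
stem = [(1, 12), (2, 11), (3, 10), (4, 9)]
for pair, label in classify_pairs(ref, var, stem).items():
    print(pair, label)
```

A stem in which most variant positions fall into the first three classes would count as supported; many "disrupted" pairs would argue against the predicted fold.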

Another conserved hairpin is SL2, which consists of a 5-bp stem and 
a highly conserved loop sequence, 5’-CUUGY-3’, which has an important 
role in MHV replication (Liu et al., 2007), though the motif is less 
conserved in SL2 of group I and III CoVs (Figs. 2 and 4). Downstream of 
SL2, an additional hairpin, SL2.1, with the stable UUCG tetra-loop, was 
predicted in group Ia CoVs. Interestingly, the CUUGY loop was recently 
shown to adopt the YNMG-type of tetra-loop-folds (Liu et al., 2009). 
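
The description of SL2 as a 5-bp stem closing a 5’-CUUGY-3’ loop lends itself to a simple sequence scan. The sketch below looks for the loop motif and checks whether the five nucleotides on either side can pair with each other. It is a rough illustration under stated assumptions (perfect stem, G-U pairs allowed, no thermodynamic evaluation), and `find_sl2_like` is a hypothetical helper, not a substitute for the mfold predictions used in this study.

```python
import re

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def find_sl2_like(rna, stem_len=5):
    """Return (start, end) 1-based spans of hairpins with a CUUGY loop.

    A hit requires `stem_len` nucleotides 5' of the loop to pair, in
    antiparallel fashion, with the `stem_len` nucleotides 3' of the loop.
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    for m in re.finditer(r"(?=(CUUG[CU]))", rna):   # lookahead allows overlapping matches
        loop_start, loop_end = m.start(), m.start() + 5
        if loop_start < stem_len or loop_end + stem_len > len(rna):
            continue                                  # not enough flanking sequence
        five_arm = rna[loop_start - stem_len:loop_start]
        three_arm = rna[loop_end:loop_end + stem_len]
        if all((a, b) in PAIRS for a, b in zip(five_arm, reversed(three_arm))):
            hits.append((loop_start - stem_len + 1, loop_end + stem_len))
    return hits

# Toy sequence (hypothetical): GGAUC / CUUGU loop / GAUCC, flanked by AA.
print(find_sl2_like("AAGGAUCCUUGUGAUCCAA"))
```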


The diversity of SL3 and SL4 in CoVs 


Previously, the core L-TRS in CoVs has either been proposed to be 
non-structured (Stirrups et al., 2000; Wang and Zhang, 2000) or to 
form a hairpin structure (Shieh et al., 1987; Chang et al., 1996). We 
found that the core L-TRS and the adjacent sequence may fold into SL3 
in some CoVs, e.g. the group II Bovine coronavirus (BCoV), SARS-CoV 


and Bat coronavirus HKU4 (BatCoV-HKU4), and the group III 
coronavirus SW1 (CoV-SW1), Bulbul coronavirus HKU11 (BuCoV- 
HKU11), and Munia coronavirus HKU13 (MuCoV-HKU13) (Figs. 3 
and 4). However, the sequence variations found in group IIa CoVs are 
partially in conflict with the lower part of SL3, while in other CoVs 
there are no co-variations to support the formation of SL3. Thus, the 
CoV SL3 may not structurally resemble the L-TRS Hairpin (LTH) found 


[Fig. 3, panels A and B: secondary-structure diagrams not reproduced. 
A) Group IIa: BCoV (8 nt); variants: a, MHV-A59; b, HCoV-HKU1; c, HEV-VW572; d, HCoV-OC43. 
B) Group IIb: SARS-CoV-Tor2; variants: a, BtSCoV-Rp3; b, BtSCoV-Rm1; c, BtSCoV-273/2005; d, BtSCoV-HKU3-1; e, SARS-CoV-TJF; f, BtCoV-279/2005. 
Labeled elements in both panels: SL1, SL2, SL3, SL4, SL5, and TRS-L CS.] 

Fig. 3. The structural-phylogenetic analysis of the 5’-proximal sequences in group II CoVs. The predicted secondary structures of the 5’-proximal sequence of (A) group IIa BCoV, 
(B) group IIb SARS-CoV-Tor2, (C) group IIc BtCoV-HKU5-1, and (D) group IId BtCoV-HKU9-1 are shown. For details see Fig. 2. 


in the related arterivirus, the Equine arteritis virus (EAV), which directs 
discontinuous transcription (van den Born et al., 2004, 2005). In some 
other CoVs, e.g. TGEV and the Human coronavirus-229E (HCoV-229E), 
the core L-TRS was predicted to participate in the stem of SL4 (Figs. 2A 
and B), although sequence variations found in group Ib CoVs do not 
strongly support the involvement of the core L-TRS in the SL4 stem 
(Fig. 2B). All in all, based on the structural-phylogenetic survey, it can 
be concluded that the core L-TRS and the flanking sequences are 
poorly structured in CoVs. 

Downstream of the L-TRS, a long hairpin, SL4, was predicted for all 
CoVs (Figs. 2, 3, and 4). The presence of a large number of co-variations 
seems to support the existence of SL4 strongly, particularly 
the upper half of this structure. Raman et al. (2003) have shown that 
the structural integrity, in positive or negative strands or both, of the 
upper part of SL4 (the SL-III in their study) is important for replication 
of BCoV DI RNA. We also found that the uORF predominantly 
terminates within the SL4 (data not shown), even for those uORFs 
that are in-frame with the downstream ORF1ab (Fig. 1). 

Fig. 4. The structural-phylogenetic analysis of the 5’-proximal sequences in group III CoVs. The predicted secondary structures of the 5’-proximal sequence of (A) group IIIa IBV-Beaudette, 
(B) group IIIb CoV-SW1, (C) group IIIc BuCoV-HKU11/796, and (D) group IIId HKU13/3514 are shown. For further details see Fig. 2. 

Fig. 5. The substructural hairpins of SL5 in group I and II CoVs. The secondary structure of the SL5 substructural hairpins, SL5a-c, in (A) group Ia TGEV-purdue, (B) group Ib HCoV-229E-inf-1, 
(C) group Ic PEDV-CV777, (D) group Id BtCoV-1A, (E) group IIa BCoV, (F) group IIb SARS-CoV-Tor2, (G) group IIc BtCoV-HKU5-1, and (H) group IId BtCoV-HKU9-1 are 
shown. The start codon of the BtCoV-HKU5-1 ORF1ab is located in SL5b as indicated. SL5.1, which is located upstream of SL5 in BtCoV-HKU9-1, also contains the conserved UUUCGU 
motif. 

There is no direct evidence for the translation of the uORF in CoV-infected 
cells, although Raman et al. (2003) have suggested a positive 
correlation between maintenance of the uORF and maximal BCoV DI 
RNA accumulation. They have also shown that a DI RNA in which this 
uORF was replaced by a totally unrelated uORF could be replicated. 
Our phylogenetic analysis showed that the sequence variations 
located in SL4, which were found to maintain the integrity of the 
RNA secondary structure, are not always silent at the amino acid 
level (data not shown). Although features of uORFs seem to be conserved 
and group-specific (Fig. 1), the necessity of translation of 
this ORF needs to be determined in the future to understand why 
certain groups of CoVs need a uORF for their propagation and others 
do not. 

We noticed that the sequence of SL4 is included in the hotspot of 
the 5’-proximal genomic acceptor (Wu et al., 2006), suggesting that 
SL4 may play a role in directing subgenomic RNA synthesis and 
thereby compensate for the absence of a structured L-TRS hairpin 
(see above). 


Features of the inserted sequence in SL5 reflect the lineage of CoVs 


A fifth structural element, SL5, was predicted downstream of SL4 in 
all CoVs (Figs. 2-4). SL5 is a homologue of SL-IV of BCoV reported by 
Brian and coworkers (Raman and Brian, 2005; Brown et al., 2007) and 
is supported by co-variations in almost all CoV groups with the 
exception of group Ia and IIIa, b, and d CoVs, where sequence 
variation is low. Compared to group IIa and III CoVs, the other CoVs 
have sequence insertions in the top of SL5, which are about 110-nt 
long in group I CoVs and between 55 and 94 nt in group IIb, c, and d 
CoVs (Figs. 2, 3, and 4). Secondary structure predictions of these 
inserts revealed hairpins displaying the conserved 5’-UUYCGU-3’ loop 
motif (Fig. 5). We note that some of these hairpins resemble the 
predicted structures for four group I CoVs and SARS-CoV reported by 
Raman and Brian (2005), which were proposed to be homologues of 
BCoV SL5 (SL-IV in their report). Nevertheless, our comprehensive 
structural-phylogenetic analysis indicates that these conserved 
structural motifs are not SL5 homologues as such but are substructural 
hairpins within SL5 (Figs. 2, 3, and 5). 
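
Because the 5’-UUYCGU-3’ motif (Y = C or U) recurs throughout this analysis, a short sketch of how its occurrences could be located in a sequence may be helpful. The helper `find_uuycgu` and the toy sequence are hypothetical; note that a sequence match alone does not show that the motif sits in a hairpin loop, which still requires structure prediction or probing.

```python
import re

def find_uuycgu(rna):
    """Return (position, motif) for every UUYCGU occurrence (1-based positions).

    Y stands for a pyrimidine (C or U). A lookahead is used so that
    overlapping occurrences, if any, are all reported.
    """
    rna = rna.upper().replace("T", "U")   # accept DNA-style input as well
    return [(m.start() + 1, m.group(1))
            for m in re.finditer(r"(?=(UU[CU]CGU))", rna)]

# Toy sequence (hypothetical) carrying one UUCCGU and one UUUCGU.
print(find_uuycgu("AAUUCCGUGGUUUCGUAA"))
```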

In group I CoVs, a large number of co-variations, particularly in 
group Ib CoVs, was observed, supporting the existence of these 
substructural hairpins at the top of SL5 (Fig. 5). We noticed that 4 
different patterns of the SL5 substructural hairpins were found in 
group I CoVs. This finding supports the idea that group I CoVs may be 
clustered into 4 subgroups, groups Ia to Id. Nonetheless, the structural 
homology of SL5 within the lineage of the group I CoVs is still higher 
than that of the group II CoVs; three hairpins, SL5a, b, and c, with 
mainly the conserved 5’-UUCCGU-3’ loop sequence, were found in all 
group I CoVs. This is in agreement with the shorter phylogenetic 
distances found between each subgroup (group Ia to Id) in group I CoVs 
compared to group II CoVs, which feature more diverse sequence 
insertions, in terms of length, the presence of 5’-UUYCGU-3’ motifs, 
and secondary structure. The greater structural variation in SL5 of 
group II CoVs is as follows: (i) the substructural hairpins are replaced 
by an 8-nt sequence in group IIa CoVs (Fig. 5E); (ii) one of the three 
substructural hairpins in SL5, SL5c, contains a GNRA tetra-loop 
sequence (group IIb) or a non-conserved hepta-loop sequence 
(group IIc) but not the UUYCGU motif (Figs. 5F and G); (iii) only 
two substructural hairpins are folded on top of SL5 in group IId CoVs, 
yet an additional conserved UUCCGU motif is present in SL5.1 located 
further upstream, in between the L-TRS and SL4 (Figs. 3D and 5H). 
Thus, the pattern of the SL5 substructures is strongly related to the 
lineage and the phylogenetic distance of the group I and II CoVs. 

Similar hairpins with a conserved loop motif could not be identified 
in group III CoVs (Fig. 4). Here, SL5 has a rod-like shape as in 
group IIa. Also in the remainder of the 5’ UTR of group III CoVs, no 
hairpins could be identified that featured a UUYCGU sequence or 
another conserved motif. 


Structure probing of the SL5 substructure in HCoV-229E and SARS-CoV 


To verify the secondary structures of the proposed substructural 
hairpins in group I and II CoVs, the corresponding RNA transcripts of 
HCoV-229E and SARS-CoV were subjected to enzymatic and chemical 
structure probing (see Materials and methods). Clearly, the single-stranded 
5’-UUUCGU-3’ hexa-loop sequences in HCoV-229E SL5a, 
SL5b, and SL5c are recognized by the single-strand-specific probes 
DMS, RNase A, S1 nuclease, and/or RNase T1 (Fig. 6A), suggesting that 
these nucleotides are unpaired. Cuts by RNase V1, an 
enzyme that cleaves double-stranded RNA, in the predicted stem regions 
are also in agreement with the model. Probing results of SARS-CoV were 
also in agreement with the existence of SL5a, b, and c (Fig. 6B). 

Notably, the U:U mismatches located in the stems of these 
substructures seem to form non-canonical base pairs since RNase 
V1 recognized U222 and U221 in HCoV-229E SL5b, as well as U193 in 
SARS-CoV SL5a. In fact, several (tandem) U:U mismatches were 
identified in the SL5 substructural hairpins, e.g. the SL5a in the group 
Ic Porcine epidemic diarrhea virus (PEDV) (Fig. 5C) and the group Id Bat 
coronavirus 1A (BtCoV-1A) (Fig. 5D), as well as in other 5’ cis-acting 
elements, e.g. the MHV SL1 (Li et al., 2008). Interestingly, co-variations 
were frequently found at the positions of these tandem U:U 
mismatches, e.g. SL4 (Figs. 2B, 3C, 4C) and SL5a-c (Figs. 5B, C, D, 
and H). This suggests the formation of (tandem) U:U base pairs similar 
to what has been reported for the 5’-CU-3’/5’-UU-3’ non-canonical 
base pairs found in the Y stem of polio-like enterovirus 3’ UTRs 
(Lescrinier et al., 2003). 


Are the SL5 substructural hairpins the counterparts of the group 
IIa packaging signal? 


It has been generally found that a strong packaging signal (PS) or 
encapsidation signal, which directs specific packaging or encapsidation 
of genomic RNA, usually encompasses repetitions of conserved 
(structural) motifs (Hellendoorn et al., 1996; Chen et al., 2007). This 
leads us to propose that the SL5 substructures bearing the highly 
conserved UUYCGU repeats function as genomic PS for group I and II 
CoVs, including SARS-CoV. 

Studies of the genomic PSs in CoVs have been mainly focused on 
group IIa CoVs in the past, e.g. MHV and BCoV (Fosmire et al., 1992; 
Makino et al., 1990; van der Most et al., 1991; Woo et al., 1997; Chen 
et al., 2007; Cologna and Hogue, 2000). For other groups of CoVs, e.g. 
SARS-CoV, the identification of a putative PS has been reported by 
Hsieh et al. (2005). This PS was thought to be a homologue of the MHV 
PS located in the corresponding region near the 3’ end of ORF1ab. 
However, it has to be noted that the specificity of the proposed SARS- 
CoV PS to direct RNA packaging was not determined in their study, 
and the predicted secondary structure of their “homologue of MHV 
PS” lacks the conserved features of the MHV PS structure reported by 
Chen et al. (2007). We also doubt the possibility of identifying an 
MHV-like PS in the “corresponding region” of SARS-CoV genomic RNA 
because an alignment of nsp15 sequences clearly shows that the 
sequence corresponding to the MHV PS is absent in SARS-CoV and other 
non-group IIa CoVs (Fig. 7). 

Interestingly, the presence or absence of the region corresponding 
to the group IIa PS may not interfere with the function of nsp15 as the 
functional domains remain intact in both MHV and SARS-CoV nsp15 
(Joseph et al., 2007). There seems, however, to be a strong correlation 


Fig. 6. Structure probing of the inserted sequences in SL5 of group Ib HCoV-229E and group IIb SARS-CoV-Tor2. The secondary structures of the SL5 substructural hairpins of (A) 
HCoV-229E and (B) SARS-CoV were analyzed by enzymatic and chemical structure probing. Lane annotation of the denaturing gels: Un, untreated; D, DMS treated; R, RNase 
A treated; T1, RNase T1 treated; V1, RNase V1 treated; S1, S1 nuclease treated; G, U, C, and A, the RNA sequencing ladder. 
[Fig. 7: alignment not reproduced. Aligned nsp15 sequences: group Ia TGEV, Ib HCoV-229E, Ic PEDV, and Id BtCoV-1A; 
group IIa MHV-A59, BCoV, HCoV-HKU1, HEV, and HCoV-OC43; group IIb SARS-CoV (positions 6591-6655), IIc BtCoV-HKU4 (6631-6697), 
and IId HKU9 (6451-6515); group IIIa IBV (6139-6206), IIIb SW1 (5882-5953), IIIc HKU11 (5803-5867), and IIId HKU13 (5846-5912).] 


Fig. 7. Multiple alignment of the CoV nsp15 sequence corresponding to the group IIa packaging signal. The amino acid sequences of the group IIa CoV nsp15 are aligned with the 
sequences of other CoV groups, showing the underlined sequence insertion of the packaging signal corresponding region in group IIa CoVs. 


between the lack of an MHV PS-corresponding region and the presence 
of SL5 substructures and vice versa (Fig. 5). This correlation strongly 
suggests that the SL5 substructural hairpins located in the 5’ UTR are 
the counterparts of the genomic PS present in group IIa CoVs, and 
presumably the UUCCGU structural repeats (Fig. 5A) are responsible 
for the packaging activity reported by Escors et al. (2003) for the first 
649 nts of TGEV genomic RNA. 


Conclusions 


The diversity of the genomic RNA sequence provides a wealth of 
structural and phylogenetic information on the lineage of CoVs and 
improves our understanding of the evolution of the 5’ cis-acting 
elements. We have shown that the pattern of these cis-acting 
elements in the 5’ UTR is highly related to the phylogenetic distance 
based on the viral protein sequences, suggesting that the viral 
proteins and the RNA sequence evolved simultaneously, possibly to 
maintain functional RNA-protein interactions. 

The unique and conserved features of the 5’ UTR and SL5 highlight 
the role of RNA structure in the evolution of CoVs and may serve as a 
roadmap for further studies. Future experiments should also verify 
whether the conserved UUYCGU motifs in SL5 function as PS in group I 
and II CoVs by interacting with nucleocapsid and/or membrane proteins 
(Molenkamp and Spaan, 1997; Narayanan and Makino, 2001; 
Narayanan et al., 2003). The absence of these or other conserved 
motifs in the 5’ UTR of group III CoVs suggests that their PSs are 
located elsewhere in the genome. This possibility is currently being 
explored. 


Materials and methods 
Structural-phylogenetic analysis 


Multiple alignment of all CoV 5’-proximal sequences available in 
GenBank was used to select coronaviruses with the highest sequence 
diversity. Sequences of the 5’ 420 nts of these variants were clustered 
by ClustalW2 on the EBI webserver (Larkin et al., 2007). Secondary structures 
of this region were predicted by the Mfold webserver (Zuker, 
2003). The alignment of CoV nsp15 was done with the Kalign webserver 
(Lassmann and Sonnhammer, 2006) (Fig. 7). 


Structure probing and primer extension 


The RNA transcripts encompassing the entire HCoV-229E and 
SARS-CoV SL5 region (about 180 nt) were synthesized in vitro using the 
RiboMAX™ RNA production system (Promega). The corresponding 
cDNA templates with an upstream T7 promoter were amplified by 
PCR using oligonucleotides 5’-TAATACGACTCACTATAGGGCATGCCTAGTGCACCTACGCAG-3’ 
(the T7 promoter sequence is at the 5’ end) 
and 5’-CAAACTGAGTTTGGACGTGTG-3’ for SARS-CoV SL5 and oligonucleotides 
5’-TAATACGACTCACTATAGGGTAATTGAAATTTCATTTGGG-3’ 
(the T7 promoter sequence is at the 5’ end) and 5’-GTTGTGACACTTGCCGTAGC-3’ 
for HCoV-229E SL5. Purified RNA transcripts were 
subjected to chemical and enzymatic probing as described in Chen 
et al. (2007). In general, 0.001% dimethylsulfate (DMS), 1 pg RNase A, 
0.001 U RNase T1, 0.1 U RNase V1, and 0.8 U S1 nuclease were used for 
the probing reactions (1x), followed by serial dilutions with a factor of 
1/5 (1/5x and 1/25x) or 1/8 (1/8x and 1/64x). The primer 
extension was carried out with 0.01 pg of treated transcripts, 0.5 µl 
of a 0.1 mM concentration of the MHV1 primer, 1 µl of 5 mM dGAT, 
1 µl of 25 µM dCTP, 0.1 µl of α-32P-labeled dCTP (10 mCi/ml), 1 µl of 
5x reverse transcriptase buffer, and 20 U of Moloney murine leukemia 
virus reverse transcriptase (Promega). 
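
The serial dilutions described above are simple geometric series, which the following sketch works out with exact fractions. The helper `dilution_series` is hypothetical, and the pairing of a given probe with a given dilution factor in the example (RNase V1 with 1/8) is an assumption chosen only to illustrate the arithmetic; the text does not state which factor was used for which probe.

```python
from fractions import Fraction

def dilution_series(amount_1x, factor, steps=2):
    """Amounts at 1x and after `steps` successive dilutions by `factor`."""
    return [amount_1x * factor ** k for k in range(steps + 1)]

# Relative concentrations for the two schemes mentioned in the text.
print(dilution_series(Fraction(1), Fraction(1, 5)))   # 1x, 1/5x, 1/25x
print(dilution_series(Fraction(1), Fraction(1, 8)))   # 1x, 1/8x, 1/64x

# Example with an absolute amount (assumed pairing): 0.1 U RNase V1, factor 1/8.
v1_units = dilution_series(Fraction("0.1"), Fraction(1, 8))
print([float(u) for u in v1_units])
```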